Asynchronous Distributed Data Parallelism for Machine Learning
Authors
Abstract
Distributed machine learning has gained much attention due to the recent proliferation of large-scale learning problems. Designing a high-performance framework poses many challenges and opportunities for system engineers. This paper presents a novel architecture for solving distributed learning problems in the framework of data parallelism, where model replicas are trained over multiple worker nodes. The worker nodes are grouped into worker groups, which enables model replicas to be aggregated asynchronously via peer-to-peer communication. The merits of this framework include elastic scalability, fault tolerance, and efficient communication.
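The abstract does not specify an implementation, but the idea can be illustrated with a small sketch. The following Python snippet simulates one worker group with threads: each worker trains a model replica on its own data shard and asynchronously averages its replica with a randomly chosen peer, with no central parameter server and no global barrier. All names, the quadratic loss, and the hyperparameters are illustrative assumptions, not the authors' API.

    import threading
    import random

    import numpy as np

    DIM = 10     # model dimensionality (illustrative)
    STEPS = 200  # local SGD steps per worker (illustrative)
    LR = 0.01    # learning rate (illustrative)


    class Worker:
        """One model replica trained on a local data shard."""

        def __init__(self, wid, data, targets, group):
            self.wid = wid
            self.data = data          # this worker's data shard
            self.targets = targets
            self.group = group        # peers in the same worker group
            self.weights = np.zeros(DIM)
            self.lock = threading.Lock()  # guards the replica during averaging

        def local_step(self):
            # One SGD step on a random local example (least-squares loss).
            # The gradient may read a slightly stale replica; that staleness
            # is exactly what asynchronous training tolerates.
            i = random.randrange(len(self.data))
            x, y = self.data[i], self.targets[i]
            grad = (self.weights @ x - y) * x
            with self.lock:
                self.weights -= LR * grad

        def gossip(self):
            # Asynchronous peer-to-peer aggregation: average this replica
            # with one random peer's replica. Locks are taken in a fixed
            # (wid) order so two concurrent gossips cannot deadlock.
            peer = random.choice([w for w in self.group if w is not self])
            first, second = sorted([self, peer], key=lambda w: w.wid)
            with first.lock, second.lock:
                avg = (self.weights + peer.weights) / 2.0
                self.weights = avg.copy()
                peer.weights = avg.copy()

        def run(self):
            for step in range(STEPS):
                self.local_step()
                if step % 10 == 0:
                    self.gossip()


    # Synthetic regression problem, sharded across a group of 4 workers.
    rng = np.random.default_rng(0)
    true_w = rng.normal(size=DIM)
    group = []
    for wid in range(4):
        X = rng.normal(size=(100, DIM))
        group.append(Worker(wid, X, X @ true_w, group))

    threads = [threading.Thread(target=w.run) for w in group]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    print("replica distance to true model:",
          [round(float(np.linalg.norm(w.weights - true_w)), 3) for w in group])

In a real deployment the threads would be processes on separate nodes, and the lock-guarded averaging step would be a network message exchange between peers; the sketch only shows the aggregation pattern the abstract describes.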
Similar resources
Distributed Machine Learning: Foundations, Trends, and Practices
In recent years, artificial intelligence has achieved great success in many important applications. Both novel machine learning algorithms (e.g., deep neural networks) and their distributed implementations play critical roles in this success. In this tutorial, we will first review popular machine learning algorithms and the optimization techniques they use. Second, we will introduce widely...
Asynchronous Decentralized Parallel Stochastic Gradient Descent
Recent work shows that decentralized parallel stochastic gradient descent (D-PSGD) can outperform its centralized counterpart both theoretically and practically. While asynchronous parallelism is a powerful technique for improving the efficiency of distributed machine learning platforms and has been widely used in many popular machine learning software packages and solvers based on centrali...
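To make the decentralized step concrete, here is a minimal synchronous rendering of the neighbor-averaging idea behind D-PSGD. The ring topology, quadratic loss, and hyperparameters are assumptions for illustration; the cited paper's contribution is the asynchronous variant, which removes the lockstep outer loop used below.

    import numpy as np

    # Each of n workers takes a local SGD step, then replaces its replica
    # with the average of itself and its two ring neighbors (a doubly
    # stochastic mixing step), so no worker ever talks to a central server.
    rng = np.random.default_rng(1)
    n, dim, lr = 5, 8, 0.05
    true_w = rng.normal(size=dim)
    replicas = [np.zeros(dim) for _ in range(n)]

    for step in range(300):
        # Local stochastic gradient step on each worker's own sample.
        for i in range(n):
            x = rng.normal(size=dim)
            grad = (replicas[i] @ x - true_w @ x) * x
            replicas[i] = replicas[i] - lr * grad
        # Decentralized mixing: average with ring neighbors only.
        replicas = [(replicas[(i - 1) % n] + replicas[i]
                     + replicas[(i + 1) % n]) / 3.0
                    for i in range(n)]

    print("max distance to true model:",
          max(float(np.linalg.norm(w - true_w)) for w in replicas))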
Thesis Proposal: Parallel Learning and Inference in Probabilistic Graphical Models
Probabilistic graphical models are one of the most influential and widely used techniques in machine learning. Powered by exponential gains in processor technology, graphical models have been successfully applied to a wide range of increasingly large and complex real-world problems. However, recent developments in computer architecture, large-scale computing, and data-storage have shifted the f...
ASAP: Asynchronous Approximate Data-Parallel Computation
Emerging workloads such as graph processing and machine learning are approximate because of the scale of the data involved and the stochastic nature of the underlying algorithms. These algorithms are often distributed over multiple machines using bulk-synchronous processing (BSP) or other synchronous processing paradigms such as map-reduce. However, data-parallel processing primitives such as repe...
Distributed GraphLab: A Framework for Machine Learning and Data Mining in the Cloud
While high-level data parallel frameworks, like MapReduce, simplify the design and implementation of large-scale data processing systems, they do not naturally or efficiently support many important data mining and machine learning algorithms and can lead to inefficient learning systems. To help fill this critical void, we introduced the GraphLab abstraction which naturally expresses asynchronou...
Publication date: 2015